Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complex geocoding config [MAP-706] [MAP-708] [MAP-707] [MAP-458] #166

Merged
merged 89 commits into from
Jan 7, 2025

Conversation

janbaykara
Copy link
Member

@janbaykara janbaykara commented Dec 17, 2024

This PR adds a new, more capable geocoding option that can involve multiple fields.

  • geography_config is a new approach to geocoding. When the field is set (see test cases), it allows three new approaches to geocoding:
    1. Multi-field area geocoding. It can be a single area name, or it could be multiple columns (like council -> ward). A good example of this is the Britain Elects ward-level GE2024 results forecast.
    2. Address geocoding with per-row prefix / suffix / country configurations. A good example of this is a spreadsheet with business names and street names. Both fields are important to get precise coordinates, and adding an arbitrary suffix like "Glasgow" gives the geocoder the context it needs to make the right choices.
    3. Coordinate geocoding. You pick a latitude and longitude field, and the rest is history. This was an explicit request from Tipping Point UK.
  • There is a full test suite for these geocoding configurations, and they run on the Github CI test runner as a follow-up step.
  • As part of area geocoding, we now import inactive political boundaries from MapIt, taking the row count from 9000~ to over 20,000.
  • There are a few cases where we use a new service, findthatpostcode.uk, for backup postcode geocoding when postcodes.io fails.
  • When geography_config is set, geocoding cannot be set differently by the user in the UI. So there are some JSX changes for that.

@commonknowledge-bot commonknowledge-bot deployed to feature/map-668-hnh-ward-csv-issues - meep-database PR #166 December 17, 2024 12:32 — with Render Active
@janbaykara janbaykara changed the title Feature/map 668 hnh ward csv issues Complex geocoding config Dec 17, 2024
Copy link

sentry-io bot commented Dec 17, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: hub/management/commands/import_areas.py

Function Unhandled Issue
handle BadRequestException: Usage limit reached. utils.m...
Event Count: 1

Did you find this useful? React with a 👍 or 👎

@commonknowledge-bot commonknowledge-bot temporarily deployed to feature/map-668-hnh-ward-csv-issues - meep-intelligence-hub-backend PR #166 December 17, 2024 12:33 — with Render Destroyed
@commonknowledge-bot commonknowledge-bot temporarily deployed to feature/map-668-hnh-ward-csv-issues - meep-intelligence-hub-backend PR #166 December 17, 2024 12:33 — with Render Destroyed
@commonknowledge-bot commonknowledge-bot temporarily deployed to feature/map-668-hnh-ward-csv-issues - meep-intelligence-hub-frontend PR #166 December 17, 2024 12:33 — with Render Destroyed
@commonknowledge-bot commonknowledge-bot deployed to feature/map-668-hnh-ward-csv-issues - meep-intelligence-hub-worker PR #166 December 17, 2024 12:33 — with Render Active
Copy link

@joaquimds joaquimds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I just had a few minor comments.

hub/data_imports/geocoding_config.py Show resolved Hide resolved
hub/data_imports/geocoding_config.py Show resolved Hide resolved
if any([job.status == "doing" for job in jobs]):
status = "doing"
elif any([job.status == "failed" for job in jobs]):
status = "failed"
elif all([job.status == "succeeded" for job in jobs]):
status = "succeeded"
elif number_of_jobs_ahead_in_queue <= 0:
status = "succeeded"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what's the logic here?

Copy link
Member Author

@janbaykara janbaykara Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GraphQL query was responding with status succeeded even if you had cancelled some of the jobs. So I added this logic: if there are no more jobs, report that the status is done. There wasn't a done status, so I settled for succeeded because I'm an optimist starting 2025.

@@ -1836,7 +1841,7 @@ async def update_many(self, mapped_records: list[MappedMember], **kwargs):
"Update many not implemented for this data source type."
)

def get_record_id(self, record):
def get_record_id(self, record) -> Optional[Union[str, int]]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: lots of things break if get_record_id returns None. Can we make this non-optional? if so, what do we return instead of None?

Copy link
Member Author

@janbaykara janbaykara Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return a false ID without that causing genuine problems? That seems like a dodgy answer to me. Instead I think we should handle the case.

The case here is that I added a Google Sheet with an empty ID cell, so Django did return a None. I didn't change the behaviour, I just specified typing that reflected reality.

You might be correct but I think that this is a separate ticket to solve the problem of "what happens when data sources return a None ID?" Because this is already a live problem in the Mapped system.

Copy link
Member Author

@janbaykara janbaykara Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do some heuristic work to generate our own IDs, but then what if the ID in the system changes? Then suddenly we'll have duplicate data in Mapped.

I think the best approach is to keep get_record_id honest and then deal with None IDs with the principle that we can't use them, basically. New ticket for this: https://linear.app/commonknowledge/issue/MAP-767/gracefully-handle-reject-records-with-a-none-id

"ward": "Credenhill",
"expected_area_type_code": "WD23",
"expected_area_gss": "E05012957",
},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: have a "X city council" vs "X council" case here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow, can you expand?

hub/tests/fixtures/geocoding_cases.py Outdated Show resolved Hide resolved
@commonknowledge-bot commonknowledge-bot temporarily deployed to feature/map-668-hnh-ward-csv-issues - meep-intelligence-hub-backend PR #166 January 7, 2025 17:18 — with Render Destroyed
@commonknowledge-bot commonknowledge-bot temporarily deployed to feature/map-668-hnh-ward-csv-issues - meep-intelligence-hub-worker PR #166 January 7, 2025 17:19 — with Render Destroyed
@commonknowledge-bot commonknowledge-bot temporarily deployed to feature/map-668-hnh-ward-csv-issues - meep-intelligence-hub-frontend PR #166 January 7, 2025 17:19 — with Render Destroyed
@commonknowledge-bot commonknowledge-bot temporarily deployed to feature/map-668-hnh-ward-csv-issues - meep-intelligence-hub-backend PR #166 January 7, 2025 17:58 — with Render Destroyed
@commonknowledge-bot commonknowledge-bot temporarily deployed to feature/map-668-hnh-ward-csv-issues - meep-intelligence-hub-worker PR #166 January 7, 2025 17:58 — with Render Destroyed
@janbaykara janbaykara merged commit 4a3bf76 into main Jan 7, 2025
4 checks passed
@janbaykara janbaykara deleted the feature/map-668-hnh-ward-csv-issues branch January 7, 2025 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants